ABSTRACT


This thesis presents Cost Estimating Relationships (CERs) for fighter aircraft.
Since the fighter aircraft is one of the most important tactical weapon systems, it is
very useful to establish CERs solely for fighter aircraft. Using the public data on U.S.
fighter aircraft, Ordinary Least Squares (OLS) is used as the primary statistical method
of establishing CERs. The data collection techniques and adjustments used are
discussed, and simple and multiple linear regressions are performed on various
combinations of the explanatory variables. This thesis then shows that CERs based on
new fighter aircraft data are more reliable than those based on new and old fighter
aircraft data.




TABLE OF CONTENTS

I.   INTRODUCTION
     A. THESIS OBJECTIVE
     B. WHY DO THIS?
     C. ORGANIZATION
II.  PRIOR FIGHTER AIRCRAFT CERs
III. DATA COLLECTION AND ADJUSTMENTS
     A. DATA COLLECTION
     B. DATA ADJUSTMENT
        1. Price-Level Adjustments
        2. Cost-Quantity Adjustments
IV.  STATISTICAL APPROACH
     A. SIMPLE LINEAR REGRESSION
        1. Least-Squares Estimation
        2. The Correlation Coefficient
        3. Statistical Inference
     B. MULTIPLE LINEAR REGRESSION
        1. Ordinary Least-Squares (OLS) Estimation
        2. The Correlation Coefficient
        3. Statistical Inference
V.   ANALYSIS OF THE MODELS
VI.  CONCLUSION
APPENDIX A: AIRCRAFT DATA
APPENDIX B: PRICE INDEX
APPENDIX C: SIMPLE AND MULTIPLE LINEAR REGRESSION MODELS


OSI le SS DIUSS OC sh O SIS ee oT) 
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST


LIST OF TABLES

1. SELECTED CERS FROM THE DAPCA-III MODEL
2. SELECTED CERS FROM THE LARGE MODEL
3. SELECTED CERS FROM IDA MODEL
4. THE CHARACTERISTICS OF THREE MODELS
5. COMPARISONS OF PREDICTED F-16 COSTS FOR THREE MODELS
6. SUMMARY OF COST ESTIMATING MODELS
7. SUMMARY OF COST PREDICTORS


ACKNOWLEDGEMENTS


I would like to express my gratitude to my country, the Republic of Korea, and 
particularly the Korean Air Force for allowing me the opportunity to study at the 
Naval Postgraduate School. 

Additionally, I want to give my sincere thanks to Professor Michael Sovereign
for his advice and willing support in the completion of this work. I also owe my
thanks to Professor Dan Boger for his significant and useful suggestions.

Finally, I want to give a special thanks to my wife, Ok Woon, for her devoted
assistance during our stay in the United States.


I. INTRODUCTION 


A parametric cost estimate has been defined as an estimate which predicts cost
by means of explanatory variables such as performance characteristics, physical
characteristics, and characteristics relevant to the development process, as derived from
experience on logically related systems [Ref. 1: p.72]. It is based on the assumption
that the past is a reasonably reliable guide to the future, which means the estimate
captures the relationship between past experience and future application.

The cost estimation of military hardware uses experience on existing equipment
to predict the cost of next-generation weapons. Traditionally, acquisition of
next-generation weapons requires substantial costs. In the past, however, cost was not
always a major consideration in choosing the weapons. To save money in the long run
and operate within a tighter budget, costs must be reliably estimated during
requirements formulation in determining which weapon provides the best value in
fulfilling mission needs.

Cost Estimating Relationships (CERs) are mathematical equations which express
system costs as a function of various explanatory variables. They are most generally
derived through statistical regression analysis of historical cost data. The construction
and use of CERs form the foundation for making independent parametric cost
estimates [Ref. 2].


A. THESIS OBJECTIVE 

Developing new CERs for fighter aircraft is the major objective of this thesis.
There are already several cost estimating methods and CERs for aircraft. This thesis will
discuss the statistical approaches and the CERs for fighter aircraft only, using
explanatory variables such as thrust, weight, etc.

This thesis also has objectives related to the goal of developing new CERs. They
are:

1) To research currently developed CERs based on historical data. There are
many CERs which were developed in previous periods. They may be used by
an experienced analyst, and study of them will be helpful in developing new
CERs.

2) To present data collection and adjustment approaches. Collecting the right
data and adjusting the collected data are required in order to develop CERs.
Data imperfections are frequently encountered difficulties in weapon system
cost estimation.

3) To apply alternative statistical methods. CERs that use explanatory variables
are relied upon to predict the cost at a high level of aggregation. The
statistical techniques can be used in a variety of situations, but not in all
situations. They will vary according to the purpose of the study and the
information available.

4) To apply CERs. By using newly developed CERs, it may be possible to
predict the costs of fighter aircraft. Also, it may be possible to estimate the
costs of international fighter aircraft from these CERs.


B. WHY DO THIS ? 

Korea (South) knows the misery of war as a result of the Korean War
(1950-1953) and wishes to live in peace forever. However, North Korea is a belligerent
communist country. Therefore, as a deterrent to an all-out war, Korea has to have
high defense capabilities. Maintenance of a strong defense force is one of the most
reliable ways to keep the peace.

Ownership of superior weapon systems is one of the best methods of maintaining
strong defenses. Fighter aircraft are one of the most powerful weapon systems
developed for modern warfare. However, fighter aircraft acquisition is extremely
expensive. Since excessive spending for defense will check national development, the
choice between systems must be seriously considered.

Korea is still a developing country and is currently one of the major weapon
importing countries. Nevertheless, the economic growth of Korea is worthy of close
attention. Korea's economy has been growing at an increasing rate for more than
twenty years. As a result, Korea is now changing from a weapon importing country to
a weapon producing country.

At this time, it could be meaningful to develop new CERs for fighter aircraft.
CERs are based on readily available explanatory variables, so they allow the decision
maker to evaluate the cost impact of future designs and make trade-offs accordingly.
After acquisition, the potential use of these CERs still exists. They may be used as
validated CERs the next time. However, since the earlier CERs are out of date in that
they did not include the newest data, developing new CERs is necessary.

Korea's particular interests regarding fighter aircraft are weight, speed, and
electronic equipment. As a defense force, fighter aircraft must be sufficiently
lightweight that they can be used quickly to react against attacking aircraft. However,
as interceptors, fighter aircraft have to have high speed capability and superior
electronic equipment in order to intercept targets. Therefore, fighter aircraft must be
lightweight, yet be able to reach speeds of at least Mach 2.0, and carry the newest
superior electronic equipment. Fighter aircraft such as the F-16 or F-18, for example,
are the most suitable types for Korea.


C. ORGANIZATION 

Chapter II introduces some of the CERs that have been developed for aircraft.
Chapter III deals with the data collection and adjustment. Chapter IV concerns the
statistical approach and includes a discussion of the ordinary least-squares method as a
regression technique. Chapter V deals with the analysis of the established models and
includes a description of the prediction analysis which estimates the costs of an
international fighter aircraft from the CERs of U.S. fighter aircraft. Finally, Chapter
VI offers conclusions regarding the interpretation of selected CERs.
II. PRIOR FIGHTER AIRCRAFT CERs


As implied earlier, CERs are based on historical data. These CERs are no better
than the data on which they are based. Therefore, reviewing some of the developed
methods and models may have a beneficial effect on developing new CERs.

Many organizations have developed cost models, and different techniques have
been employed. Through the years the Rand Corporation has organized and updated
the Department of Defense (DOD) data base for airframe costs, identifying the
deficiencies and correcting them where possible, mainly in support of Air Force
sponsored research efforts.

“A Computer Model for Estimating Development and Procurement Costs of
Aircraft (DAPCA-III)”, which was published in 1976, is one of Rand’s aircraft airframe
cost models [Ref. 3]. It is based on a sample of twenty-five U.S. military aircraft
including fighter, attack, bomber, and cargo aircraft. The model uses CERs to estimate
the development and procurement costs of two major flyaway subsystems of the
aircraft: airframe and engines. Avionics costs are included in the model but are not
derived parametrically. These costs, however, do not quite constitute the total system
cost of the aircraft.

Table 1 shows the CERs used in DAPCA-III. They are based on the cost of a
total production quantity of 200 units including prototype aircraft. For those aircraft
whose total production quantity is less than 200 units, the cost-quantity relationship or
learning curve is used in order to obtain a value at that quantity. CERs used in the
model are based on log-linear regressions (they are shown in the power form). The
major explanatory variables are airframe unit weight and maximum speed at the best
altitude. Additionally, the time of first flight in calendar quarters after 1942 is found to
be a significant explanatory variable for recurring manufacturing labor and materials,
and improves the statistical properties of the equation. Thus, equations with and
without the time variable were considered separately. Also, a dummy variable
designates whether cargo or noncargo aircraft were used for flight test cost.

Costs are provided in seven categories: total engineering hours, total tooling
hours, nonrecurring manufacturing labor hours, recurring manufacturing labor hours,
nonrecurring manufacturing material costs, recurring manufacturing material costs, and
flight test costs. All costs used in the model are in constant 1975 dollars.




TABLE 1
SELECTED CERS FROM THE DAPCA-III MODEL


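$$E = \text{[illegible]} \cdot W^{0.6636} \cdot S^{0.9871} \cdot \text{[illegible]} \cdot Q^{b+1} \cdot 10^{-6}$$
$$T = 22.39 \cdot W^{0.6214} \cdot S^{\text{[illegible]}} \cdot \text{[illegible]} \cdot Q^{b+1} \cdot 10^{-6}$$
$$ML_{NR} = 0.162597 \cdot W^{\text{[illegible]}} \cdot \text{[illegible]}$$
$$ML_{R} = \text{[illegible]} \cdot W^{\text{[illegible]}} \cdot S^{0.5464} \cdot T^{-0.4711} \cdot \text{[illegible]} \cdot Q^{b+1} \cdot 10^{-6}$$
$$MM_{NR} = 0.030614 \cdot W^{0.7290} \cdot S^{1.9240} \cdot 10^{-6}$$
$$MM_{R} = 93.409 \cdot W^{0.8121} \cdot S^{0.6951} \cdot T^{0.4744} \cdot \text{[illegible]} \cdot Q^{b+1} \cdot 10^{-6}$$
$$FT = 153.25 \cdot W^{0.7095} \cdot S^{0.5856} \cdot Q_{FT}^{\text{[illegible]}} \cdot DV^{-1.5570} \cdot 10^{-6}$$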


where:

E = total engineering hours (millions)
T = total tooling hours (millions)
ML_NR = nonrecurring manufacturing labor hours (millions)
ML_R = recurring manufacturing labor hours (millions)
MM_NR = nonrecurring manufacturing materials cost (millions of 1975 dollars)
MM_R = recurring manufacturing materials cost (millions of 1975 dollars)
FT = flight test cost (millions of 1975 dollars)
W = airframe unit weight (lb)
S = maximum speed at best altitude (kts)
Q = airframe quantity
b = exponent corresponding to cumulative average learning curve slope
T = time of first flight (calendar quarters after 1942 = 4 x [input date - 1942.75])
Q_FT = number of flight test aircraft
DV = dummy variable (1 for noncargo, 2 for cargo aircraft)


DAPCA-III is a meaningful model for use as a long-range planning tool for
normal, full scale production programs. However, the model is based on a sample of
several different types of military aircraft. A cost model based on a more
homogeneous data sample is the result of the work of J. Large. It presents a
parametric cost model for fighter aircraft only [Ref. 4].


Large's “A Comparison of Cost Models for Fighter Aircraft”, which was
published in 1977, is another of Rand’s aircraft cost models and is referred to as the
Large model [Ref. 4]. It derives CERs to estimate the cost of fighter aircraft only. There
are two types of CERs in the model. One is derived from a sample of seventeen U.S.
military fighter aircraft only, while the other is derived from a sample of thirty-one
different types of aircraft. The fighter aircraft data in the larger sample include several
older fighter aircraft as well as new fighter aircraft.

Table 2 shows the CERs based on a sample of fighter aircraft only. They are
based on a cumulative total production quantity of 100 units. As in DAPCA-III, the
most reliable explanatory variables are airframe unit weight and maximum speed.
Additionally, the model afforded an opportunity to examine an explanatory variable
that was thought to have special applicability to fighter aircraft. It is referred to as the
specific power (P) and represented as

$$P = \frac{(\text{static thrust})(\text{max speed})}{\text{combat weight}}$$

Both speed and specific power were considered separately along with weight and other
variables in the regression analyses, for comparison purposes.

Costs are provided in seven different categories: cumulative total engineering
hours, cumulative total tooling hours, development support cost, flight test cost,
cumulative recurring manufacturing hours, cumulative recurring manufacturing
materials cost, and cumulative recurring quality control hours. In order to
accommodate the less detailed older data, two of the cost categories in DAPCA-III
(nonrecurring labor and materials) are combined into a single category, development
support. All costs used in the model are in constant 1973 dollars.

The Large model, as a model based on fighter aircraft only, compares the CERs
for fighter aircraft with the CERs for different types of aircraft, and with the CERs
used in DAPCA-III. However, since the model was published earlier, the cost
information for older aircraft is less reliable than for later aircraft, and the
development and production experience of these earlier aircraft is not considered an
appropriate indicator of the future. Furthermore, as in DAPCA-III, CERs used in the
model make use of subsystem characteristics in order to estimate the costs of airframe,
engines, etc. Therefore, it would be desirable to develop new CERs which are based on
recent aircraft data and make use of overall aircraft characteristics.




TABLE 2
SELECTED CERS FROM THE LARGE MODEL


$$E_{100} = 0.0276 \cdot W^{1.24} \cdot P^{0.72}$$
$$\text{[illegible]} = \text{[illegible]} \cdot W^{0.637} \cdot S^{0.760}$$
$$\text{[illegible]} = \text{[illegible]} \cdot W^{0.715} \cdot \text{[illegible]}$$
$$ML_{100} = \text{[illegible]} \cdot W^{\text{[illegible]}} \cdot S^{0.306}$$
$$MM_{100} = \text{[illegible]}$$
$$MM_{100} = 0.404 \cdot W^{1.23} \cdot P^{0.567}$$
$$DS = 0.00082 \cdot W^{\text{[illegible]}} \cdot \text{[illegible]}$$
$$FT = 1.053 \cdot W^{\text{[illegible]}} \cdot \text{[illegible]}$$
where:

E_100 = cumulative total engineering hours at 100 aircraft (thousands)
T_100 = cumulative total tooling hours at 100 aircraft (thousands)
ML_100 = cumulative recurring manufacturing labor hours at 100 aircraft (thousands)
MM_100 = cumulative recurring materials cost at 100 aircraft (thousands of 1973 dollars)
DS = development support cost (thousands of 1973 dollars)
FT = flight test cost (thousands of 1973 dollars)
QC_100 = cumulative recurring quality control hours at 100 aircraft (thousands)
W = airframe unit weight (lb)
S = maximum speed (kts)
P = specific power (hp/lb)
FTA = number of flight test aircraft


“Cost Estimating Relationships for Tactical Combat Aircraft”, which was
published by IDA (Institute for Defense Analyses) in 1984, is one of the most current
cost models for tactical combat aircraft and is referred to as the IDA model [Ref. 5]. It
is based on a sample of twenty-six U.S. military aircraft: fighter, attack, bomber
aircraft, etc. However, seven fighter and attack aircraft are used to develop the CERs
for RDT&E (Research, Development, Test and Evaluation) cost, and fourteen fighter
and attack aircraft for procurement cost.

Table 3 shows CERs used in the IDA model. They are developed to estimate the
RDT&E and procurement costs of fighter and attack aircraft. To develop the CERs,
overall aircraft characteristics are used, and this is one of the main features of the
model. CERs used in the model are based on log-linear regressions. A total production
quantity of 400 units is selected as the quantity at which to obtain the costs for the
regression. The major explanatory variables are DCPR (Defense Contractor's Planning
Report) weight, the ratio of thrust to DCPR weight, maximum speed at best altitude,
and IOC (Initial Operational Capability) date. DCPR weight is derived from empty
weight by use of the relationships indicated in Table 3.

Costs are provided in two categories: total RDT&E cost and cumulative average
flyaway cost. All costs used in the IDA model are in FY 1985 TOA (Total
Obligational Authority) dollars. A cumulative average learning curve slope of 0.92 is
used to adjust the aircraft cost data [Ref. 5: p.5].


TABLE 3
SELECTED CERS FROM IDA MODEL


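$$RD = \text{[illegible]} \cdot DCPR^{\text{[illegible]}} \cdot (THRUST/DCPR)^{\text{[illegible]}} \cdot (1.0239)^{IOC - \text{[illegible]}}$$
$$FLY = 0.194 \cdot (DCPR/1000)^{0.963} \cdot (SP/100)^{0.76} \cdot (1.934)^{IOC - 78}$$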


where:

RD = total RDT&E cost (millions)
FLY = cumulative average flyaway cost of 400 aircraft (millions)
THRUST = total maximum thrust at sea level (lb)
SP = maximum speed at best altitude (kts)
IOC = initial operational capability date (last two digits of calendar year)
DCPR = aircraft Defense Contractor's Planning Report weight (lb)


$DCPR = 0.0913 \cdot (EW)^{1.177}$ for $EW > 50000$
$DCPR = 0.246 \cdot (EW)^{\text{[illegible]}}$ for $10000 \le EW \le 50000$
$DCPR = 13.26 \cdot (EW)^{\text{[illegible]}}$ for $EW < 10000$


EW = aircraft empty weight (lb) 


Table 4 shows the summarized characteristics of the three models. Since each
model has its own purpose, the characteristics are different for each model. However,
it is very interesting that the predicted costs from each model are fairly similar. Table
5 compares the predicted F-16 costs for these three models.


TABLE 4
THE CHARACTERISTICS OF THREE MODELS

                   DAPCA-III               Large                  IDA
published year     1976                    1977                   1984
sampled aircraft   several types           fighter only           fighter, attack
sample size        25                      17                     [illegible]
costs of CERs      subsystem               subsystem              overall system
major variables    weight, speed,          weight, speed,         weight, speed,
                   time of first flight,   specific power         thrust-weight ratio,
                   dummy                                          IOC date
baseline quantity  cumulative 200          cumulative 100         cumulative 400
baseline cost      1975 dollars            1973 dollars           1985 dollars


The predicted costs of the DAPCA-III and Large models came from summing up all
of their subsystem costs. The last page of Large provides a good comparison between
the DAPCA-III model and Large model of F-16 cost estimates for 100 aircraft. The
estimates range from 8.867 to 10.356 million dollars, with the total flyaway cost by the
IDA model being 9.401 million dollars. The actual total flyaway cost of an F-16 for
100 aircraft is 9.641 million dollars according to the “US Military Aircraft Cost
Handbook” [Ref. 6: p.IV-337]. So, the predicted cost from the IDA model is a better
prediction than the costs given by the other models. There may be several reasons for
this result. One of them is that the DAPCA-III and Large models were published
earlier than the IDA model. Also, it may be that CERs based on the overall aircraft
system are better than CERs based on the subsystems.
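
The comparison in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below is only an illustration: it takes the four dollar figures quoted in the text (all in millions of constant 1981 dollars) and expresses each prediction as a percentage error against the actual F-16 cost. The function name `percent_error` and the dictionary labels are hypothetical names chosen for this example, not part of any of the three models.

```python
# Figures quoted in the text (millions of constant 1981 dollars).
ACTUAL_F16_COST = 9.641

predictions = {
    "lowest DAPCA-III/Large estimate": 8.867,
    "highest DAPCA-III/Large estimate": 10.356,
    "IDA model flyaway cost": 9.401,
}


def percent_error(predicted, actual):
    """Signed prediction error as a percentage of the actual cost."""
    return (predicted - actual) / actual * 100.0


# Signed percentage error for each quoted prediction.
errors = {name: percent_error(value, ACTUAL_F16_COST)
          for name, value in predictions.items()}

for name, err in errors.items():
    print(f"{name}: {err:+.2f}%")

# The prediction with the smallest absolute error.
closest = min(errors, key=lambda name: abs(errors[name]))
print("closest:", closest)
```

Under these figures the IDA estimate lies within about 2.5 percent of the actual cost, while the two ends of the DAPCA-III/Large range miss by roughly 7 to 8 percent.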
TABLE 5
COMPARISONS OF PREDICTED F-16 COSTS FOR THREE MODELS

Model        Case                              Predicted cost
DAPCA-III    with time                         [illegible]
DAPCA-III    without time                      10.356
Large        fighter only, with power          10.004
Large        fighter only, with speed          8.867
Large        31 aircraft of several types      10.232
IDA          RDT&E cost                        [illegible]
IDA          flyaway cost                      9.401





Note:

1. Costs are based on the total production quantity of 100 units
2. All costs are in constant 1981 dollars (millions)
3. For price-level adjustments, price indices in Appendix B were used
4. Actual cost of an F-16A is 9.641 million dollars [Ref. 6: p.IV-337]
III. DATA COLLECTION AND ADJUSTMENTS


CERs are generally obtained from the statistical analysis of historical data. Data
must be collected in order to develop CERs and then adjusted for validity and
reliability. Acquisition of data is the process of identifying, searching out, obtaining,
verifying, and recording the specific information that is of value to the analyst.

The initial step in developing CERs is identifying the aircraft of interest from the
many types of aircraft such as fighters, bombers, cargo carriers, reconnaissance aircraft,
helicopters, etc. However, this thesis presents the CERs for fighter aircraft only. The
aircraft data used have been collected and adjusted from unclassified sources.


A. DATA COLLECTION 

Developing reliable CERs, especially for a military application, is very difficult at
best. Consequently there are many problems with the CERs used for military
hardware. The most significant problem with data collection on a military system is to
obtain complete information from unclassified documents. This has led to data
anomalies in weapon system cost estimation.

Early data have not been systematically processed and stored, which makes the
historical information of little value. In an attempt to alleviate this data collection
problem, the Contractor Information Report (CIR) Program was established by the
Department of Defense (DOD) in 1966. This reporting system was designed to collect
costs and related data on major contracts for aircraft, missile, and space programs.
The CIR was enlarged to cover the other areas of defense contracting with
implementation of the Contractor Cost Data Reporting System (CCDR). The CCDR
collects contractor costs and related data needed to satisfy cost estimating
requirements. In recent years, The Analytical Science Corporation (TASC), with the
assistance of Management Consulting and Research, Inc. (MCR), has been compiling
data and analyzing the cost versus the effectiveness of tactical aircraft produced since
19[illegible].

While collecting data, the levels of accuracy and aggregation should be
considered in order to develop new CERs. There are two basic categories of data:
aircraft physical and performance parameters, and cost. The sample for this thesis
consisted of the following aircraft:


F-4E          F-14A   F-86F    F-104C
F-6A          F-15A   F-89D    F-105D
F-8E          F-16A   F-100D   F-106A
[illegible]   F-18A   F-101B   [illegible]
F-11A         F-84F   F-102A


The model developed in this thesis is based on this sample of nineteen U.S.
fighter aircraft. Since the purpose of this thesis is to provide fighter-based CERs, only
fighter aircraft data were collected. The parametric data for fighter aircraft were
obtained from references 7 to 11; however, Jane’s All the World's Aircraft was used
primarily. Most of the earlier CERs were out of date in that they did not include
aircraft introduced into the armed forces in the 1970's and 1980's, such as the F-14,
F-15, F-16 and F-18. However, the data used in this thesis include the newest fighter
aircraft. In order to obtain reliable CERs, all the aircraft included in this thesis had
initial flight dates following 1950. Only one aircraft has been selected from each design
of fighter aircraft in order to decrease potential multicollinearity in the data sample.

The cost data were obtained from the “US Military Aircraft Cost Handbook”
[Ref. 6]. They are based on a cumulative total production quantity of 100 units, so the
costs presented in Appendix A are the cumulative average total flyaway costs. All
costs used in this thesis are in constant 1981 dollars.

The following definitions were developed and used as a basis for determining
what adjustments would have to be made to the data. They are:

1) Weight : maximum take-off gross weight (lb)

2) Thrust : total maximum engine thrust (lb)

3) Speed : maximum speed at best altitude (kts)

4) Year : year of initial operational capability

5) Cost : Cumulative Average Cost (CAC) of 100 units for total flyaway cost in
constant 1981 dollars (millions)


Like the IDA model, overall aircraft characteristics are used in order to estimate
the fighter aircraft costs. The major variables for airframe cost are maximum speed at
best altitude, maximum take-off gross weight and initial operational capability year.
Some other variables relating to aircraft characteristics (e.g., wing span, maximum
thrust, thrust-weight ratio, etc.) were tried and evaluated but generally were found not
to be significant. Appendix A shows the total data base used in this thesis.
B. DATA ADJUSTMENT

The distortion of the sample observations used in generating CERs is another
significant problem encountered with military hardware. The major source of distortion
is the lack of data normalization. Information collected and reported should be adjusted
using standardized procedures such as those provided by the Cost Accounting Standards
Board, which establishes consistency in accounting practices among government
contractors. Standardization has an important effect upon the ability of DOD
contracting personnel to evaluate proposals and better determine allocation and
allowability of costs. Additionally, when using data for different purposes, it is
necessary to make different adjustments in the data. The two most common adjustments
are price-level and cost-quantity adjustments.


1. Price-Level Adjustments 

In order to compare the cost of an old system to the cost of a new system, the
cost figures must be adjusted to constant dollars. Adjustments are made by means of a
price index constructed from a time-series of data in which one year is selected as the
base and the value for that year is expressed as 100. The other years are then expressed
as percentages of this base.

Total Obligational Authority (TOA) dollars in a year (then-year dollars) are
the amounts budgeted in a specific fiscal year. The conversion of TOA dollars to
constant dollars is accomplished by dividing TOA by a composite index [Ref. 6:
p.III-5]. Mathematically, the relationship can be expressed as

$$\text{Constant Dollars} = \frac{\text{TOA}}{\text{composite index}} \times 100$$

Appendix B shows the deflator index and composite indices used by the
military services (e.g., Army, Navy, Air Force). The composite indices are based on
the Office of the Assistant Secretary of Defense (OASD), Comptroller, deflator for
major commodity procurement and service outlay profiles. The tables are based on
Fiscal Year (FY) 1981 and all index numbers are related to FY81 constant dollars. So
the composite indices are used to normalize aircraft procurement costs of the respective
services into FY81 constant dollars. Multiplication by 100 is required since the index is
expressed as a percentage.


As an example of price-level adjustment, the calculation of the total cost of the F-16
is presented. According to the Large model, the total cost of an F-16 from the
fighter sample using specific power is 4.84 million in constant 1973 dollars. The
composite index of 1973 is 48.38 [Ref. 4: p.15]. Therefore, we can calculate the
constant 1981 dollars from the values, that is

$$\text{Constant 1981 Dollars} = \frac{4.84}{48.38} \times 100 = 10.004 \text{ (millions)}$$
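
The price-level adjustment above is simple enough to express as a small helper. The following is a minimal sketch, assuming the index is expressed as a percentage of the base year (base year = 100) as described in the text; the function name `to_constant_dollars` is a hypothetical name introduced for illustration only.

```python
def to_constant_dollars(amount, composite_index):
    """Convert a then-year amount to base-year constant dollars.

    composite_index is the price index for the year of `amount`,
    expressed as a percentage of the base year (base year = 100).
    """
    if composite_index <= 0:
        raise ValueError("composite index must be positive")
    return amount / composite_index * 100.0


# Values quoted in the text: 4.84 million in 1973 dollars, 1973 index of 48.38
# relative to FY81 = 100.
f16_cost_1981 = to_constant_dollars(4.84, 48.38)
print(round(f16_cost_1981, 3))  # 10.004
```

Applying the same function with the index of any other year from Appendix B would normalize that year's cost to FY81 dollars in the same way.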


2. Cost-Quantity Adjustments 

Learning curves, as cost-quantity relationships, are used in order to develop
consistent measures of costs. The basis of learning curve theory is that each time the
total quantity of items produced doubles, the cost per item is reduced to a constant
percentage of its previous cost. So if the average cost of producing all 200 units is 90
percent of the average cost of producing the first 100 units, the process follows a 90
percent cumulative average learning curve.

The cost-quantity relationships are represented using regression analysis
techniques assuming the following functional form:

$$C_n = C_1 \cdot n^{b} \quad \text{or} \quad \ln(C_n) = \ln(C_1) + b \cdot \ln(n)$$

where:

ln = the natural logarithm function
C_n = cumulative average cost for quantity n
n = cumulative production quantity
C_1 = the cost of the first unit produced
b = the exponent related to the slope of the learning curve

The slope, S, is related to b as

$$b = \frac{\ln(S)}{\ln(2)}$$

where:

S = slope expressed as a decimal

Therefore, the coefficient b means that when cumulative production doubles,
the cumulative average cost falls to the fraction S of its previous value, since
$C_{2n} / C_n = 2^{b} = S$.

As an example of cost-quantity adjustment, the calculation of the total flyaway cost
of the F-6A is presented. The cumulative average cost of 230 aircraft is 3.584 million
dollars and of 408 aircraft is 3.051 million dollars [Ref. 6: p.IV-278]. So, based upon these
two points the learning curve slope can be plotted at about 0.84. As implied earlier,
the equation which calculates the cumulative average cost of n aircraft from the cost
of the first unit produced is expressed as

$$C_n = C_1 \cdot n^{b}$$

where:

$$b = \frac{\ln(S)}{\ln(2)}$$

Therefore,

$$b = \frac{\ln(0.84)}{\ln(2)} = -0.25154$$

$$C_{230} = C_1 \cdot 230^{-0.25154} = C_1 \cdot (0.25464)$$

Thus,

$$C_1 = \frac{3.584}{0.25464}$$

From this value it is possible to calculate the cumulative average cost of 100 aircraft of
the F-6A. The cost is

$$C_{100} = \frac{3.584}{0.25464} \cdot 100^{-0.25154} = 4.419 \text{ (million dollars)}$$

The costs used in this thesis are Cumulative Average Costs (CAC) for a quantity
of 100 units. Each fighter has a different learning curve with a unique slope.
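
The F-6A calculation above can be reproduced step by step in code. The sketch below is an illustration under the same assumptions as the worked example: a cumulative average learning curve of the form C_n = C_1 * n^b with slope 0.84 and the quoted cost of 3.584 million dollars at 230 aircraft. The function names are hypothetical and chosen only for this example.

```python
import math


def learning_exponent(slope):
    """Exponent b of the cumulative average curve for a slope S (0 < S <= 1)."""
    return math.log(slope) / math.log(2.0)


def first_unit_cost(cum_avg_cost, quantity, slope):
    """Back out C_1 from a known cumulative average cost at a given quantity."""
    return cum_avg_cost / quantity ** learning_exponent(slope)


def cumulative_average_cost(first_unit, quantity, slope):
    """C_n = C_1 * n**b, the cumulative average cost at quantity n."""
    return first_unit * quantity ** learning_exponent(slope)


# Values quoted in the text for the F-6A example.
SLOPE = 0.84
b = learning_exponent(SLOPE)                     # about -0.25154
c1 = first_unit_cost(3.584, 230, SLOPE)          # 3.584 / 0.25464
c100 = cumulative_average_cost(c1, 100, SLOPE)   # about 4.419 million dollars

print(round(b, 5), round(c100, 3))

# Doubling the quantity multiplies the cumulative average cost by the slope.
ratio = (cumulative_average_cost(c1, 200, SLOPE)
         / cumulative_average_cost(c1, 100, SLOPE))
print(round(ratio, 2))
```

The final ratio illustrates the defining property of the curve: moving from 100 to 200 units scales the cumulative average cost by the slope.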


IV. STATISTICAL APPROACH 


CERs are developed from the historical cost of systems and the explanatory
variables of those systems. Therefore, some variables which are logically and
theoretically related to cost have to be selected in order to develop reliable CERs. An
important characteristic of reliable CERs is that the relationship between cost and
explanatory variables must be direct and obvious.

Regression analysis can be applied as a statistical technique to develop CERs
from the historical cost and parametric data. Regression analysis is primarily
concerned with the determination of the equation of a line or curve which will predict
how the dependent variable will vary with respect to some independent variables.
Therefore, regression analysis will estimate the coefficients of the equation (e.g.,
intercept and slopes) and infer the reliability and significance of the results of the
estimate. (Johnston's Econometric Methods [Ref. 12] is the source of all facts and
derivations shown in this chapter.)

Generally, there are two types of linear regression models, simple and multiple.
The difference between these two models is the number of variables in the equation.
The simple linear regression model has only two variables, while the multiple linear
regression model has more than two variables.


A. SIMPLE LINEAR REGRESSION 

The equation used in simple linear regression has two variables, cost and an
explanatory variable. This means that the cost is expressed as a linear function of an
explanatory variable. Thus, as an example of the simple linear regression model, the
linear relationship is

\[ y = \alpha + \beta x + u \]

where:

y = the dependent (cost) variable
x = the independent (explanatory) variable
α = the intercept of the line
β = the slope of the line
u = the error term between the actual cost and the expected cost of y


Additionally, the log-linear regression model is very frequently used as another
method of expressing the linear model. The log-linear equation results from taking
logarithms of both sides of the multiplicative form of the equation, and is written as

\[ y = e^{\alpha} x^{\beta} e^{u} \quad \text{or} \quad \ln(y) = \alpha + \beta \ln(x) + u \]

Thus, this equation graphs as a linear relationship when plotted in terms of ln(x) and
ln(y).

There are some assumptions made with regard to the error term. The first
assumption is that the error term is normally distributed with zero mean and variance
σ². That is,

\[ u \sim N(0, \sigma^2) \]

The second assumption is that the error terms for different x values are independent and
identically distributed.


1. Least-Squares Estimation 
As implied earlier, the simple linear regression model has some unknown
parameters: α, β, and σ². Those unknown parameters have to be estimated in order to
establish CERs. Least squares is the most frequently used method for estimating
the unknown parameters.

By using the simple linear regression model, the actual cost of the system is
indicated by

\[ y_i = \alpha + \beta x_i + u_i \]

where y_i is the actual cost of the ith observation. Then, any straight line drawn
through the scatter of data points may be regarded as an estimate of the hypothesized
relationship y = α + βx + u. A straight line is indicated by

\[ \hat{y} = a + b x \]

where ŷ indicates the value of the line at any given value of x.

The principle of least squares is that the unknown parameters are selected
to minimize the sum of squared residuals. This minimization is expressed as

\[ \min \sum e_i^2 \]

Under this principle, the unknown parameters are determined as

\[ b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

\[ a = \bar{y} - b \bar{x} \]

The difference between the actual cost and the expected cost is defined as the residual,
which is written as

\[ e_i = y_i - \hat{y}_i = y_i - a - b x_i \]

where e_i is the residual of the ith observation. Also, under the minimization principle,
the unbiased estimator for σ² is determined as

\[ \hat{\sigma}^2 = \frac{\sum e_i^2}{n - 2} \]
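
The least-squares estimators can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five data points are hypothetical and are not taken from Appendix A, and the function name `fit_simple_ols` is an illustrative choice.

```python
def fit_simple_ols(x, y):
    """Return (a, b, residuals, sigma2_hat) for the line y = a + b*x.

    b is the ratio of the cross-product sum to the sum of squared
    deviations of x, a = y_bar - b * x_bar, and sigma2_hat is the
    residual sum of squares divided by n - 2.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sigma2_hat = sum(e * e for e in residuals) / (n - 2)
    return a, b, residuals, sigma2_hat


# Hypothetical observations, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, residuals, sigma2_hat = fit_simple_ols(x, y)
print(a, b, sigma2_hat)
```

For the log-linear form, the same function would be applied to ln(x) and ln(y) instead of x and y.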


The following are some properties of least squares. First, the expected
values of the estimators a and b are exactly the same as the values of α and β. This is
indicated by

\[ E[a] = \alpha \quad \text{and} \quad E[b] = \beta \]

Thus, a and b, as the least-squares estimators in a simple linear regression model, are
unbiased estimators for α and β. Secondly, the least-squares estimators have the
minimum variances among all linear unbiased estimators. As a result, the least-squares
estimators a and b are called the best linear unbiased estimators [Ref. 13: p.473].
The minimum variance property is the major reason why least squares is so frequently
employed in estimating unknown parameters.

By using least squares, some simple linear regression models are obtained.
Then, the log-linear function can be selected as the best simple linear model. An
example is

\[ \ln(C) = -1.760 + 1.230 \ln(T) \quad \text{or} \quad \ln(C) = \ln(0.172) + 1.230 \ln(T) \]

and the model can be rewritten as

\[ C = 0.172 \, T^{1.230} \]

where:
C = total flyaway cost of fighter aircraft in constant 1981 dollars (millions)
T = total maximum engine thrust (lb)


2. The Correlation Coefficient 
The selected model must be examined in order to determine the reliability or
accuracy of that equation. There are several statistical measures that can indicate the
goodness of fit of the equation in describing data. R² is the most commonly used
measure of the goodness of fit and is defined as the coefficient of determination, which
is the square of the correlation coefficient (R). R² is computed as
follows:

\[ R^2 = \frac{\text{Explained sum of squares}}{\text{Total sum of squares}}
       = 1 - \frac{\text{Residual sum of squares}}{\text{Total sum of squares}}
       = 1 - \frac{\sum e_i^2}{\sum (y_i - \bar{y})^2} \]

R² is the proportion of the total deviation which can be explained by the regression
model. The highest possible value of R² is 1.00, which corresponds to all data points
lying on the regression line, and the lowest is 0.00.
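
As a worked check of this definition, the short Python sketch below evaluates R² from a residual and a total sum of squares. The two inputs are the sums of squares reported for the thrust model in the ANOVA table of this chapter; the function name is illustrative only.

```python
def r_squared(residual_ss: float, total_ss: float) -> float:
    """Coefficient of determination: 1 - residual SS / total SS."""
    return 1.0 - residual_ss / total_ss


# Residual and total sums of squares of the log-linear thrust model.
r2_simple = r_squared(4.799673, 16.038440)
print(round(r2_simple, 4))
```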


The value of R² from the log-linear regression model above is

\[ R^2 = 0.7007 \]

which is a relatively low value. It means that thrust alone does not explain all of the
variance in the cost data. It also means that the log-linear model above does not fit
the data well. Usually, there exist two ways to increase R² to a relatively high value.
They are:
1) To add some other variables into that equation. Adding variables may explain
the remaining variance. This will be discussed in Section B below.
2) To find other equations. If the simple linear regression model does not fit the
data well, then multiple linear regression models with other variables may fit
the data better.

3. Statistical Inference
As implied earlier, the hypothesized relationship between the dependent
variable (y) and independent variable (x) may be indicated by

\[ y = \alpha + \beta x + u \]

where u is an error term. Under this relationship, the least-squares method produces
unbiased estimators a and b. Thus, the outcome of a least-squares regression line is

\[ \hat{y} = a + b x \]

Standard statistical techniques can be applied to the least-squares result to test for
significance and to make inferences about reliability and accuracy in a probabilistic
sense.


a. t-test 
It is necessary to test the relationship between y and x. This is done by
establishing the null hypothesis that y and x are not related to each other, and the
alternative hypothesis that y and x are related to each other:

\[ H_0 : \beta = 0 \]
\[ H_1 : \beta \neq 0 \]

These hypotheses are the most frequently used, and are referred to as testing the
significance of x. By a similar development, tests on the intercept are

\[ H_0 : \alpha = 0 \]
\[ H_1 : \alpha \neq 0 \]

The test that is commonly used for this purpose is known as the t-test
because the tests on α and β are based on the t distribution. It follows then that:

\[ t = \frac{a - \alpha}{S_a} \sim t(n-2) \]

\[ t = \frac{b - \beta}{S_b} \sim t(n-2) \]

where:

S_a = the standard error of a

\[ S_a = S \sqrt{\frac{\sum x_i^2}{n \sum (x_i - \bar{x})^2}} \]

S_b = the standard error of b

\[ S_b = \frac{S}{\sqrt{\sum (x_i - \bar{x})^2}} \]

S = the standard error of regression

\[ S = \sqrt{\frac{\sum e_i^2}{n - 2}} \]


If the absolute value of the sample t statistic is greater than the preselected critical
value of t, we reject the null hypothesis in favor of the alternative hypothesis and
conclude that x plays a significant role in the determination of y. The following values
result from the least-squares regression line based on the data in Appendix A. They are

a = -1.76044
b = 1.230396

S_a = 0.586718
S_b = 0.195015

Since n = 19, from the t distribution with 17 degrees of freedom,

\[ t_{0.025}(17) = 2.110 \]

Thus, the intercept is significantly different from zero since

\[ |t_a| = |-3.000| = 3.000 > 2.110 \]

Also, the slope is significant since

\[ t_b = 6.309 > 2.110 \]
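
The two t statistics can be reproduced directly from the reported estimates and standard errors. The sketch below is illustrative: the function name is hypothetical, and the critical value is simply the tabulated 2.110 quoted in the text rather than a value computed from the t distribution.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error


T_CRIT_17 = 2.110  # two-sided 5 percent critical value, 17 degrees of freedom

t_a = t_statistic(-1.76044, 0.586718)   # intercept
t_b = t_statistic(1.230396, 0.195015)   # slope on ln(thrust)

for name, t in (("intercept", t_a), ("slope", t_b)):
    # A coefficient is significant when |t| exceeds the critical value.
    print(name, round(t, 3), abs(t) > T_CRIT_17)
```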


b. Confidence Interval 
Examining the confidence intervals for α and β is another way to test the
significance of the unbiased estimators a and b. Since a confidence interval which
includes zero is equivalent to accepting the null hypothesis that the true value of the
parameter is zero, an interval which does not include zero is equivalent to rejecting the
null hypothesis.

Generally, 100(1 - p) percent confidence intervals for α and β are indicated
by

\[ CI(\alpha) = a \pm t_{p/2} \cdot S_a \]

\[ CI(\beta) = b \pm t_{p/2} \cdot S_b \]

where S_a and S_b are the standard errors of a and b.

A 95 percent confidence interval for α is then

\[ CI(\alpha) = -1.760 \pm (2.110 \times 0.587) \]

or -2.998 to -0.522. Also, a 95 percent confidence interval for β is

\[ CI(\beta) = 1.230 \pm (2.110 \times 0.195) \]

or 0.819 to 1.642.

Therefore, the fact that the confidence intervals for α and β do not include zero means
that the null hypotheses are rejected, and the unbiased estimators a and b are
statistically significant.
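
The interval arithmetic can be checked with a short Python sketch. It assumes the unrounded estimates and standard errors reported in the t-test discussion and the tabulated critical value 2.110; the function name is illustrative.

```python
def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) = estimate -/+ t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


ci_alpha = confidence_interval(-1.76044, 0.586718, 2.110)
ci_beta = confidence_interval(1.230396, 0.195015, 2.110)

for name, (low, high) in (("alpha", ci_alpha), ("beta", ci_beta)):
    excludes_zero = low > 0 or high < 0
    print(name, round(low, 3), round(high, 3), excludes_zero)
```

An interval that excludes zero leads to the same conclusion as the corresponding two-sided t-test at the same significance level.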


c. F-test 
The analysis of variance (ANOVA) test is merely a significance test on β
performed in another way, and is referred to as the F-test. The F statistic is the ratio
of the mean square due to x over the residual mean square. Thus, it is indicated by

\[ F = \frac{\sum (\hat{y}_i - \bar{y})^2 / 1}{\sum e_i^2 / (n-2)} \sim F(1, n-2) \]

The significance of x is thus tested by examining whether the sample F exceeds the
appropriate critical value of F taken from the upper tail of the F distribution.
Therefore, the test procedure is that if the value of F is greater than the critical value
of F(1, n - 2), then reject H_0: β = 0.

Usually, the F-test is applied extensively in multiple linear regression
models. However, in simple linear regression models, the F variable with (1, k) degrees
of freedom is the square of a t value with k degrees of freedom. The relationship
between the t and F distributions can be explained with the correlation coefficient, R:

\[ t = \frac{R \sqrt{n-2}}{\sqrt{1 - R^2}} \]

\[ F = t^2 = \frac{R^2 / 1}{(1 - R^2)/(n-2)} \]

The ANOVA for the least-squares regression line based on the 19
observations in Appendix A is as follows:

Source      Degrees of freedom    Sum of squares    Mean square
Thrust               1               11.238767       11.238767
Residual            17                4.799673        0.282334
Total               18               16.038440

Since n = 19, using the F distribution with 1 and 17 degrees of freedom,

\[ F_{0.05}(1, 17) = 4.451 \]

The sample F statistic is

\[ F = \frac{11.238767}{0.282334} = 39.807 > 4.451 \]
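
The stated relationship between F, t, and R² in the two-variable case can be verified numerically. The sketch below is illustrative: it uses the residual and total sums of squares from the ANOVA table and the reported slope and standard error, and the function name is hypothetical.

```python
def f_from_r_squared(r2, n, k):
    """F statistic of a k-parameter model (intercept included) from R^2."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


total_ss, residual_ss = 16.038440, 4.799673
r2 = 1.0 - residual_ss / total_ss

f_r2 = f_from_r_squared(r2, n=19, k=2)                       # R^2 route
f_anova = ((total_ss - residual_ss) / 1) / (residual_ss / 17)  # ANOVA route
t_slope = 1.230396 / 0.195015                                 # slope t statistic

# In simple regression the three routes agree up to rounding of the inputs.
print(round(f_r2, 3), round(f_anova, 3), round(t_slope ** 2, 3))
```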


Thus, H_0: β = 0 is rejected. It means that the slope is not zero.

B. MULTIPLE LINEAR REGRESSION

In the previous section, the linear relationship between cost and thrust was
examined as a simple linear regression model. It was selected as the best model using
two variables, and the relationship was represented with a log-linear function. However,
its low R² means that a model with only one independent variable, thrust, cannot
describe the situation well. Therefore, some other models which have more than one
independent variable have to be examined.

Multiple linear regression models have more than one independent variable.
Thus, the vector of sample observations on the dependent variable (Y) may be
expressed as a linear combination of the sample observations on the independent
variables (X) and the vector of the error term (u). An example of the hypothesized
multiple linear regression model is represented as

\[ Y = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + u \]

where:

Y = the vector for the dependent (cost) variable
X_1 = the unit vector for an intercept
X_i = the vector for independent variables (i ≠ 1)
β_i = unknown parameters
u = the vector of error terms

Each vector is a column vector of n elements. The multiple linear regression model
may also be expressed in matrix form as

\[ Y = X\beta + u \]

where Y and u are n × 1 matrices, X is an n × k matrix, and β is a k × 1 matrix.
Like the simple linear regression model, there are some assumptions made for the
multiple linear regression model. They are:
1) The u vector has a multivariate normal distribution, with each element of u
having a zero mean and the same variance (σ²). That is

\[ u \sim N(0, \sigma^2 I) \]

where I is the identity matrix.

2) X is a nonstochastic matrix and its rank is k. That is

\[ \rho(X) = k \]

1. Ordinary Least-Squares (OLS) Estimation 
As implied earlier, the hypothesized multiple linear regression model and the
vector of the fitted line are indicated by

\[ Y = X\beta + u \]
\[ \hat{Y} = Xb \]

where b is a k-element vector. Thus, a vector of errors or residuals can be defined as

\[ e = Y - Xb \]

The principle of least squares is that b is selected to minimize the sum of
the squared residuals, e'e. Under this principle, b is determined as

\[ b = (X'X)^{-1} X'Y = \beta + (X'X)^{-1} X'u \]

Then the variance-covariance matrix of the OLS estimators is

\[ \mathrm{var}(b) = \sigma^2 (X'X)^{-1} \]

where the elements on the main diagonal of this matrix give the sampling variances of
the corresponding elements of b, and the off-diagonal terms give the sampling
covariances.

Since the expected value of b is exactly the same as the value of β, the OLS
estimators are linear unbiased estimators. This is indicated by

\[ E[b] = \beta \]
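
The matrix formulas above can be sketched in plain Python without external libraries. This is a minimal illustration, not the software used for the thesis: the six observations are hypothetical (weight in thousands of pounds and years counted from 1960, which keeps X'X well conditioned), and all function names are illustrative.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: the rank of X is less than k")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_ols(x_rows, y):
    """Return b = (X'X)^-1 X'Y, residuals e, S^2 = e'e/(n-k), and S^2 (X'X)^-1."""
    n, k = len(x_rows), len(x_rows[0])
    xt = transpose(x_rows)
    xtx_inv = invert(matmul(xt, x_rows))
    b = [row[0] for row in matmul(xtx_inv, matmul(xt, [[v] for v in y]))]
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in x_rows]
    e = [yi - fi for yi, fi in zip(y, fitted)]
    s2 = sum(v * v for v in e) / (n - k)
    var_b = [[s2 * v for v in row] for row in xtx_inv]
    return b, e, s2, var_b


# Hypothetical data: column of ones (intercept), weight, years since 1960.
X = [[1, 20.0, 0], [1, 35.0, 5], [1, 50.0, 12],
     [1, 30.0, 15], [1, 60.0, 18], [1, 45.0, 23]]
Y = [3.0, 6.5, 12.0, 10.5, 19.0, 18.5]
b, e, s2, var_b = fit_ols(X, Y)
print(b, s2)
```

The square roots of the diagonal elements of `var_b` are the standard errors used in the t-tests and confidence intervals for the individual coefficients.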


Also, since the OLS estimators have the minimum sampling variances among all of the
linear unbiased estimators, b is the best linear unbiased estimator (b.l.u.e.). Using
OLS, two equations are selected as the best models. They are

\[ C_{19} = -701.635 + 0.215W + 0.358Y \]

\[ C_{6} = -3994.618 + 0.688W + 2.013Y \]

where:
W = (maximum take-off gross weight)/1000
Y = year of initial operational capability

The former model is based on the 19 data points in Appendix A, while the
latter is based on only 6 data points. However, the 6 data points used in the latter
model have an initial operational capability year of 1965 or after. It means that the
latter model is based on relatively new aircraft data. The 6 data points
contained in the latter model are

F-4E     F-14A    F-15A
F-16A    F-18A    F-111A

2. The Correlation Coefficient 
The correlation coefficient is the most commonly used measure of the
goodness of fit. The squared multiple correlation coefficient for the k-variable model
is defined as

\[ R^2 = \frac{\text{Explained sum of squares}}{\text{Total sum of squares}}
       = 1 - \frac{\text{Residual sum of squares}}{\text{Total sum of squares}}
       = 1 - \frac{e'e}{Y'AY} \]

where:
A = I - (1/n) ii'
I = identity matrix
i = a column vector of n units
Y'AY = the sum of squared deviations in Y

The value of R² from the selected multiple regression model based on the 19
observations is

\[ R^2_{19} = 0.7504 \]

Although this is a slightly higher value than that of the simple regression model, the
value is still relatively low. It means that the weight and year do not explain all of the
variance in the cost data, and the model does not fit the data well.

However, the value of R² from the selected multiple regression model based
on the 6 observations is

\[ R^2_{6} = 0.9441 \]

This is a relatively high value; thus the weight and year variables fit the 6
data points well.

The value of R² adjusted for degrees of freedom is useful when comparing models with
different numbers of independent variables, and is referred to as the adjusted R². The
adjusted R² is defined as

\[ \bar{R}^2 = 1 - \frac{e'e/(n-k)}{Y'AY/(n-1)} \]

Thus, the values of adjusted R² for the selected simple and multiple regression models
are

\[ \bar{R}^2 = 0.6831 \quad \text{(simple model, 19 observations)} \]

\[ \bar{R}^2_{19} = 0.7192 \]

\[ \bar{R}^2_{6} = 0.9068 \]

Therefore, the adjusted R² values are only slightly lower than the corresponding values
of R² and rank the models in the same order.
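
The adjustment can be reproduced from the reported R² values alone, since the definition is equivalent to 1 - (1 - R²)(n - 1)/(n - k). The sketch below assumes that k counts the intercept, so k = 2 for the thrust model and k = 3 for the weight and year models; the names are illustrative.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k), k including the intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


# (R^2, number of observations, number of estimated coefficients)
models = {
    "simple, 19 obs": (0.7007, 19, 2),
    "multiple, 19 obs": (0.7504, 19, 3),
    "multiple, 6 obs": (0.9441, 6, 3),
}
adjusted = {name: adjusted_r_squared(*args) for name, args in models.items()}
for name, value in adjusted.items():
    print(name, round(value, 4))
```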


3. Statistical Inference 


The characteristics of the multiple linear regression models were already
mentioned at the beginning of this section. According to them, b is distributed as

\[ b \sim N(\beta, \sigma^2 (X'X)^{-1}) \]

Then, the variance of the error term, as an estimator of σ², is defined as

\[ S^2 = \frac{e'e}{n - k} \]

and S is the standard error of the regression.

a. t-test
Since β_i is the ith element of the coefficient vector β, b_i is the estimated
coefficient of X_i in the OLS regression. b is distributed independently of S². Thus the
t-test of the multiple linear regression is determined as

\[ t = \frac{b_i - \beta_i}{S \sqrt{a_{ii}}} \sim t(n-k) \]

where a_ii denotes the ith element on the principal diagonal of (X'X)^{-1}.
Hypotheses are established about β_i, where the null hypothesis is H_0: β_i = 0
and the alternative hypothesis is H_1: β_i ≠ 0. Then, the t statistics of the selected
multiple linear models are as follows:


             Based on 19 obs.    Based on 6 obs.
Intercept      [illegible]         [illegible]
Weight            3.525               7.088
Year              2.864               6.194


If n = 19, from the t distribution with 16 degrees of freedom,

\[ t_{0.025}(16) = 2.120 \]

and if n = 6, with 3 degrees of freedom,

\[ t_{0.025}(3) = 3.182 \]

Thus, since the absolute values of all of the t statistics of the selected models are
greater than their critical values, the coefficients are significantly different from zero.


b. Confidence Interval 
The 100(1 - p) percent confidence intervals for the coefficients of Weight
(X_2) and Year (X_3) are indicated by

\[ CI(\beta_2) = b_2 \pm t_{p/2} \cdot S_2 \]
\[ CI(\beta_3) = b_3 \pm t_{p/2} \cdot S_3 \]

where S_2 and S_3 are the standard errors of b_2 and b_3.


The following values are indicated in the least-squares regression lines
based on the data in Appendix A. They are as follows:

         Based on 19 obs.    Based on 6 obs.
b_2           0.215               0.688
b_3           0.358               2.013
S_2           0.061               0.097
S_3           0.125               0.325


Thus, the 95 percent confidence intervals for β_2 and β_3 based on the 19 observations
are

\[ CI(\beta_2) = 0.215 \pm (2.120 \times 0.061) \]
or 0.086 to 0.343

\[ CI(\beta_3) = 0.358 \pm (2.120 \times 0.125) \]
or 0.093 to 0.623

Also, the confidence intervals based on the 6 observations are

\[ CI(\beta_2) = 0.688 \pm (3.182 \times 0.097) \]
or 0.379 to 0.997

\[ CI(\beta_3) = 2.013 \pm (3.182 \times 0.325) \]
or 0.979 to 3.047

Therefore, the fact that none of the confidence intervals includes zero means that
b_2 and b_3 based on the 19 and 6 observations are statistically significant.


c. F-test 
The t-test is usually used to test the significance of a single coefficient.
The F-test extends the function of the t-test: it can be used to test the
significance of the complete regression and the significance of a subset of coefficients.
Thus, the F-test of the multiple linear regression is a very useful and powerful tool
for testing the independent variables, X.

In order to test the elements of β, the linear hypothesis is established as

\[ R\beta = r \]

where R is a q × k matrix with rank q, and r is a q-element vector. Therefore, if the
linear hypothesis is true, the following is obtained

\[ (Rb - r) \sim N(0, \sigma^2 R (X'X)^{-1} R') \]

Thus, the F statistic under the linear hypothesis is

\[ F = \frac{(Rb - r)' [R (X'X)^{-1} R']^{-1} (Rb - r) / q}{e'e / (n - k)} \sim F(q, n-k) \]

In order to test the joint significance of Weight (X_2) and Year (X_3), the null
hypothesis is established as

\[ H_0 : \beta_2 = \beta_3 = 0 \]

Then, the F statistic for this hypothesis can be indicated by

\[ F = \frac{\text{Explained sum of squares}/(k-1)}{\text{Residual sum of squares}/(n-k)}
     = \frac{(Y'AY - e'e)/(k-1)}{e'e/(n-k)} \]

Thus, the F statistic based on the 19 observations is

\[ F = \frac{810.264/(3-1)}{269.490/(19-3)} = 24.053 \]

Since n = 19, from the F distribution with 2 and 16 degrees of freedom,

\[ F_{0.05}(2, 16) = 3.634 < 24.053 \]

Therefore, H_0: β_2 = β_3 = 0 is rejected. It means that even though the sample R² is
numerically low, the model is significant.

Also, the F statistic based on the 6 observations is

\[ F = \frac{300.204/(3-1)}{17.773/(6-3)} = 25.337 \]

Since n = 6, from the F distribution with 2 and 3 degrees of freedom,

\[ F_{0.05}(2, 3) = 9.552 < 25.337 \]

Thus, H_0: β_2 = β_3 = 0 is also rejected. This model is therefore significant with a
numerically high R².
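
The two overall F statistics follow directly from the explained and residual sums of squares quoted in this section. The Python sketch below is a hedged check: the function name is illustrative, and the critical values are the tabulated ones cited in the text rather than values computed from the F distribution.

```python
def overall_f(explained_ss, residual_ss, n, k):
    """F = [explained SS / (k - 1)] / [residual SS / (n - k)]."""
    return (explained_ss / (k - 1)) / (residual_ss / (n - k))


f_19 = overall_f(810.264, 269.490, n=19, k=3)
f_6 = overall_f(300.204, 17.773, n=6, k=3)

# Compare each sample F with its tabulated 5 percent critical value.
print(round(f_19, 3), f_19 > 3.634)   # F(2, 16)
print(round(f_6, 3), f_6 > 9.552)     # F(2, 3)
```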


V. ANALYSIS OF THE MODELS 


Reliable CERs will accurately predict the costs of systems, provided they are
suitable for that particular system. Thus, in order to establish reliable CERs, the
previous chapter demonstrated the use of regression methods, with simple and multiple
linear regression models fitted to various combinations of the explanatory
variables contained in Appendix A. Then, some models were selected as desirable for
predicting costs of fighter aircraft using least-squares estimation. However, many
alternative models were discarded because of statistical problems. Appendix C
illustrates the use of simple and multiple linear regression models for various
combinations of the explanatory variables.

For selecting reliable models, approximately 1000 models were estimated.
Models were evaluated using from one to eight explanatory variables. The summary of
these models is presented below:


Number of variables    19 observations / 6 observations
1~3 variables          See Appendix C
4~5 variables          Statistically unsatisfactory
6~8 variables          Statistically unsatisfactory

Then, in order to check how the models fit the data, the selected models were
evaluated with several statistical measures: the coefficient of determination (R²), the
adjusted coefficient of determination (adjusted R²), standard error (SE), t statistics (t),
confidence intervals (CI), and F statistics (F). However, since no single statistic can be
a meaningful indication of the models' applicability, the models' statistics must be
looked at together. Table 6 shows a summary of the cost estimating models developed.
The table includes the selected equations, the results of the statistical measures, and the
correlation matrices of the estimated coefficients in order to aid in analyzing the
models.

TABLE 6
SUMMARY OF COST ESTIMATING MODELS

A. Simple linear regression model based on 19 observations

ln(C) = ln(0.172) + 1.230 ln(T)

R² = 0.7007      adjusted R² = 0.6831      SE = 0.531
t(b1) = -3.000   t(b2) = 6.309
CI(b2) = 0.819 to 1.642
F = 39.807

CORRELATION MATRIX OF ESTIMATES
             INTERCEP      LPWR
INTERCEP      1.0000       [illegible]
LPWR          [illegible]  1.0000

B. Multiple linear regression model based on 19 observations

C = -701.635 + 0.215W + 0.358Y

R² = 0.7504      adjusted R² = 0.7192      SE = 4.104
t(b1) = [illegible]   t(b2) = 3.525   t(b3) = 2.864
CI(b2) = 0.086 to 0.343      CI(b3) = 0.093 to 0.623
F = 24.053

CORRELATION MATRIX OF ESTIMATES
             INTERCEP      WT         YEAR
INTERCEP      1.0000      0.5662    -1.0000
WT            0.5662      1.0000    -0.5731
YEAR         -1.0000     -0.5731     1.0000

C. Multiple linear regression model based on 6 observations

C = -3994.618 + 0.688W + 2.013Y

R² = 0.9441      adjusted R² = 0.9068      SE = 2.434
t(b1) = [illegible]   t(b2) = 7.088   t(b3) = 6.194
CI(b2) = 0.379 to 0.997      CI(b3) = 0.979 to 3.047
F = 25.337

CORRELATION MATRIX OF ESTIMATES
             INTERCEP      WT         YEAR
INTERCEP      1.0000     -0.8239    -1.0000
WT           -0.8239      1.0000     0.8209
YEAR         -1.0000      0.8209     1.0000

Additionally, characteristics other than statistical measures should be considered
in analyzing the models. Some of them are:

1) The signs and the magnitudes. Usually, cost is expected to increase with
thrust and weight. Additionally, since the new aircraft contain modular
avionics which are easily updated (e.g., radar, electronic equipment, etc.), the
cost of the new aircraft is expected to increase with year of initial operational
capability. Therefore, the developed models containing positive
coefficients for thrust, weight, and year make sense.

2) The constant term. The developed multiple linear regression models contained
large negative constant terms. This means that the developed multiple linear
regression models would not be valid over the full range of possible values of
the independent variables.

3) The correlation matrix. The correlation matrices are included in the table to
aid in determining the multicollinearity that may exist between the various
independent variables in the models.


Table 6 shows that all of the t statistics are greater than their critical values, and
the confidence intervals do not include zero. This means that all of the estimated
coefficients of the developed models are significantly different from zero. Furthermore,
since all of the F statistics are greater than their critical values, the developed models
are significant.

However, only the multiple linear regression model based on the 6 observations with
an initial operational capability year of 1965 or later has desirable values of the
coefficient of determination (R² = 0.9441) and of the coefficient of determination adjusted
for degrees of freedom (adjusted R² = 0.9068). This indicates that the equation based on 6
observations fits the data well because the independent variables, weight and year,
explain most of the variance in the cost data.

Also, Table 6 shows that the multiple linear regression model based on 19
observations has a large standard error (SE), which is a measure of the
dispersion of the data and relates to the prediction intervals. It indicates that the
multiple linear regression model based on 19 observations does not have desirable
prediction intervals. Therefore, the multiple linear regression model based on 6
observations is selected as a desirable fighter aircraft CER.


Initially, the data base contained a large number of international fighter aircraft,
but many observations were eliminated because of insufficient information. As such,
only 19 observations were chosen. Since models using 19 observations were
statistically unsatisfactory, a small subset of 6 observations was selected from the
original 19. Then, the Chow test [Ref. 12: p.207-225] was performed on the models fitted
to the 6 observations and to the other 13 observations (i.e., comparisons were made
to determine if both data sets came from the same population of fighter aircraft).
Appendix D shows the test results, which indicate that the two groups of data are not
from the same population. The 6 observations are representative of current fighter
aircraft, and should provide the best estimates of future fighter aircraft costs.

Since the purpose of CERs is to estimate the cost of systems, substituting the
parameters of a proposed system into the CERs makes it possible to estimate the
cost of that system. There are two kinds of prediction: a point prediction and an
interval prediction. If the obtained equation fits the data well, then a good prediction
will be possible. However, it is very unlikely that the point prediction will be realized
exactly. Therefore, a prediction interval should be constructed in order to describe the
uncertainty of the estimates.

Point prediction is obtained by substituting the values of the independent variables into
the selected equation. As implied earlier, the selected multiple linear regression model
based on 6 observations is

\[ C_6 = -3994.618 + 0.688W + 2.013Y \]

Thus, since the values of weight (W) and year (Y) of the F-16A are 35.4 and 1978, the
selected regression equation gives the point estimate of an F-16A as follows:

\[ \hat{C}_6 = -3994.618 + 0.688(35.4) + 2.013(1978) = 11.917 \]

Also, the following formula is used to construct a 100(1 - p) percent prediction
interval (PI) for the point estimate. It includes the standard error (SE) and is
indicated as follows:

\[ PI = \hat{Y}_f \pm t_{p/2} \cdot SE \cdot [1 + R'(X'X)^{-1}R]^{1/2} \]

where Ŷ_f is the point forecast, X is the matrix of the data base with the first column of
units, and R is the vector of the proposed system's parameters.


Therefore, a 95 percent prediction interval for the F-16A based on 6 observations is
obtained from

\[ \hat{Y}_f = 11.917 \]

\[ (X'X)^{-1} = \begin{bmatrix}
70402.8289 & -8.7208 & -35.4297 \\
-8.7208 & 0.0016 & 0.0044 \\
-35.4297 & 0.0044 & 0.0178
\end{bmatrix} \]

\[ R'(X'X)^{-1}R = 0.519 \]

Thus,

\[ PI = 11.917 \pm 2.120 \, (2.434) \, \sqrt{1.519} = 11.917 \pm 6.360 \]

or 5.557 to 18.277.
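
The interval arithmetic can be checked with a short Python sketch. The inputs are taken as printed in the text (point forecast 11.917, critical value 2.120, SE 2.434, quadratic form 0.519); the function name is illustrative. The critical value is a parameter, so the 3-degrees-of-freedom value 3.182 quoted earlier for the 6-observation model can be substituted to see its effect on the width.

```python
import math


def prediction_interval(point, t_crit, std_error, quad_form):
    """Return (low, high) for point +/- t * SE * sqrt(1 + R'(X'X)^-1 R)."""
    half_width = t_crit * std_error * math.sqrt(1.0 + quad_form)
    return point - half_width, point + half_width


# Values as printed in the text for the F-16A.
low, high = prediction_interval(11.917, 2.120, 2.434, 0.519)
print(round(low, 3), round(high, 3))
```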


Up to this point we have seen some reasons to believe that the multiple linear
regression model based on 6 observations will give a better estimate of fighter aircraft
cost than more broadly based models. However, in order to aid in comparing the selected
models, Table 7 shows a summary of the cost predictions. It includes the cost
predictions of the F-16A and F-18A.

As a result, the table verifies that the multiple linear regression model based on 6
observations gives a better estimate than those of the other models. This means that,
since the 6 observations are new fighter aircraft with an initial operational capability
year of 1965 or later, a model based on new aircraft data may correctly predict the cost
of a new fighter aircraft.

The costs used in this thesis are cumulative average costs of 100 units for total
flyaway cost in millions of 1981 dollars.


TABLE 7
SUMMARY OF COST PREDICTIONS

                                  F-16A             F-18A
Actual cost                       9.641             23.968

Point prediction        S19       [illegible]       [illegible]
                        M19       [illegible]       [illegible]
                        M6        11.917            [illegible]

Prediction interval     S19       [illegible]       [illegible]
                        M19       [illegible]       [illegible]
                        M6        5.557~18.277      [illegible]

where:

S19 = simple linear regression model based on 19 observations
M19 = multiple linear regression model based on 19 observations
M6 = multiple linear regression model based on 6 observations

VI. CONCLUSION


This thesis presented a regression model of a CER for fighter aircraft. It is based 
on 19 fighter aircraft because the major objective of this thesis is developing CERs for 
fighter aircraft only. 

As implied earlier, there are many CERs for aircraft. They are very useful for
developing new CERs but are different from each other. The differences mostly
depend upon the aircraft types, the included aircraft data, and the statistical methods
used. However, even though they are different from each other, their results are
similar. This means that since the purpose of CERs is to provide a reasonable cost
estimation of systems, they give similar estimates of a particular aircraft.

As a result of this thesis, a multiple linear regression model based on 6
observations is selected as the best model to estimate the costs of fighter aircraft.
This is a very meaningful result because the 6 observations are new fighter aircraft
with an initial operational capability year of 1965 or later. There may be several reasons
for this result, such as the limited data base of the model or the applied statistical
methods. But the most reasonable cause of the result is the characteristics of the data.
Traditionally, every new fighter aircraft requires large development costs. Also, it
includes developed systems such as radar, electronic equipment, armament systems,
etc. Undoubtedly, those systems are very expensive. However, those characteristics
usually were not considered as the explanatory variables. This means that a model
based on old technology may incorrectly estimate the cost of a new system containing
advanced technology. Therefore, in order to estimate the costs of modern or future
fighter aircraft, CERs should be based on new aircraft data.

There were some difficulties in developing CERs for fighter aircraft. The data
problem was the first and most difficult problem. Sufficient numbers of observations
can support the distribution assumptions and reduce the standard error. Thus, CERs
based on sufficient numbers of observations may give better confidence or prediction
intervals because these are functions of the standard error. However, since the fighter
aircraft data used in this thesis were very limited, the result was quite a large standard
error and wide confidence or prediction intervals.

Similarly, accuracy of the data is very important. Inaccurate data is worthless
because it cannot lead to reliable CERs. Thus, under such conditions, it is very hard
to expect accurate estimates. However, some explanatory variables of new fighter
aircraft data were classified, such as the maximum speed of the F-15A. But, the selected
models were not very good CERs.

Additionally, as a statistical method, OLS has some problems. OLS is almost
exclusively the selected regression technique. It is based on the assumptions that the
error term is normally distributed, and the estimates are selected to minimize the sum
of the squared deviations of actual cost observations from their estimates. However,
OLS as a regression method is quite sensitive to outlying observations. If the data
base includes some unusual observations then it tends to give a poor result. Thus,
there is a tendency to discard those observations that seem to lie outside a normal
trend line in order to remove a possible bias in the estimating equation.

Finally, further study and development of CERs for fighter aircraft should
consider the following:

1) Use accurate and sufficient data. The purpose of this study is to get reliable
CERs which give an accurate cost estimate of the systems. This is possible
by using accurate data. Furthermore, sufficient data can reduce the standard
error so that it gives accurate confidence and prediction intervals, because they
depend upon the standard error.

2) Use alternate methods. OLS is the most frequently used estimating technique
for CERs, but it is not a perfect technique by itself. Thus, while OLS is needed to
support and compare the established CERs, alternate methods, such as
generalized least squares or least absolute value regression, will also do that.


Additionally, in order to estimate the costs of modern or future systems, it is
also important that new data be added to the model and old data
removed. That enables the model to be kept updated and restricted to fighter aircraft
with similar characteristics.

APPENDIX A
AIRCRAFT DATA

APPENDIX B

APPENDIX C
SIMPLE AND MULTIPLE LINEAR REGRESSION MODELS

3) Models with 1 variable

APPENDIX D
CHOW TEST


H_0 : The models based on 6 and 13 observations came from the same population.

\[ F = \frac{[RSS_{19} - (RSS_{6} + RSS_{13})]/k}{(RSS_{6} + RSS_{13})/(n - 2k)} \sim F(k, n - 2k) \]

Thus,

\[ F = \frac{[269.490 - (17.728 + 78.055)]/3}{(17.728 + 78.055)/(19 - 6)} = 7.859 \]

Then, the critical value is

\[ F_{0.05}(3, 13) = 3.411 < 7.859 \]

Therefore, the null hypothesis that the models based on 6 and 13 observations came
from the same population is rejected.
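
The test statistic can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the three residual sums of squares (269.490 for the pooled 19 observations, 17.728 and 78.055 for the two subsets) are an assumed reading of the worked figures in this appendix, chosen because they reproduce the reported 7.859, and the function name is illustrative.

```python
def chow_f(rss_pooled, rss_1, rss_2, n, k):
    """Chow test F = [(RSS_pooled - RSS_split) / k] / [RSS_split / (n - 2k)].

    RSS_split is the sum of the residual sums of squares of the two
    separately fitted models; k is the number of coefficients per model.
    """
    rss_split = rss_1 + rss_2
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))


f_chow = chow_f(269.490, 17.728, 78.055, n=19, k=3)
print(round(f_chow, 3))
```

A sample F well above the tabulated critical value for (k, n - 2k) degrees of freedom indicates that a single pooled regression fits markedly worse than two separate ones.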
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